Attention-LSTM-based method for knowledge reasoning of reinforcement learning agent
Patent abstract:
An attention-LSTM-based method for knowledge reasoning of a reinforcement learning agent is provided. Reasoning paths of the knowledge graph can be effectively memorized through the two-way long short-term memory network. Moreover, the attention mechanism assigns weights to the memorized path states to obtain the state that needs attention, suppress the invalid state, and screen the memorized paths, which effectively solves the problem that the knowledge reasoning of the reinforcement learning agent cannot effectively memorize the reasoning path. In the present invention, the relation path features are extracted by using the LSTM-attention network model in the reinforcement learning agent, while optimizing the reward mechanism, thereby effectively improving the reasoning accuracy of the reinforcement learning-based knowledge graph reasoning algorithm applied to multiple benchmark data sets.

Publication number: NL2028258A
Application number: NL2028258
Filing date: 2021-05-20
Publication date: 2021-08-17
Inventors: Liu Hui; Wang Yinglong; Liu Hao; Shu Minglei; Chen Chao
Applicant: Shandong Artificial Intelligence Inst
Patent description:
ATTENTION-LSTM-BASED METHOD FOR KNOWLEDGE REASONING OF REINFORCEMENT LEARNING AGENT

TECHNICAL FIELD

The present invention relates to the technical field of reinforcement learning and deep learning, and in particular to an attention-Long Short-Term Memory (LSTM)-based method for knowledge reasoning of a reinforcement learning agent.

BACKGROUND

An automatically built knowledge graph and a manually built knowledge graph both face issues such as incompleteness, lack of knowledge and incorrectness of instances, and thus are difficult to apply to vertical search, question answering systems and other similar fields. To solve these issues, in one of the solutions, effective knowledge multi-hop reasoning is performed on the knowledge graph for knowledge graph completion, link prediction, and instance correctness judgment. In the prior art, knowledge reasoning techniques based on first-order logic rules are only suitable for one-hop paths. Knowledge reasoning techniques based on random walk ranking are not suitable for large-scale knowledge graphs. Knowledge reasoning techniques based on a feedforward neural network (FNN) reinforcement learning agent cannot effectively memorize reasoning paths.

SUMMARY

In order to overcome the above-mentioned technical shortcomings, the present invention provides a method for effectively improving the reasoning accuracy of a reinforcement learning-based knowledge graph reasoning algorithm applied to multiple benchmark data sets.

To overcome the technical problems, the present invention adopts the following technical solutions. An attention-LSTM-based method for knowledge reasoning of a reinforcement learning agent includes the following steps:

a) loading a training set, a validation set, and a test set for triples in a knowledge graph, and performing a preprocessing operation on the data sets;

b) loading a knowledge embedding model, and obtaining a word vector representation of the data sets;

c) defining a reinforcement learning environment for an interaction between the reinforcement learning agent and an evaluation function, initializing the environment, and defining an interaction function;

d) building a two-way LSTM network model, setting parameters of the LSTM model, adding an attention mechanism to an output of the LSTM model, and adding attention weight parameters to all relations;

e) constructing the agent according to the network model in step d), and inputting the word vector in step b) into the LSTM model to obtain probability estimates of all adjacency relations;

f) performing an iteration according to a time step, and calculating a derivative of an accumulated result obtained after the iteration to obtain updated parameters of the model network; and

g) after an entity pair (e1, e2) is given, reasoning an accuracy of a path between the entity pair according to a mean reciprocal rank (MRR) and a hit@10 evaluation model.

Further, the preprocessing operation in step a) includes: collecting training reasoning path information, collecting test reasoning path information, and tokenizing entity relations.

Further, in step b), an embedded word vector representation of the entity relations in the triples is obtained through OpenKE-based TransH, TransE, TransR, DistMult, and ComplEx embedding models, and each entity and relation is mapped into a dense continuous word vector.
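By way of illustration only, the following minimal Python sketch shows one way steps a) and b) could be wired together: the triples are tokenized into integer ids and dense word vectors are looked up for an entity-relation pair. The toy triples, the 100-dimensional embedding size, and the plain embedding tables are assumptions for the example; the patent obtains the vectors from OpenKE TransE/TransH/TransR-style models instead.

```python
# Sketch of steps a) and b): tokenize triples and look up dense word vectors.
import torch
import torch.nn as nn


def build_vocab(triples):
    """Tokenize entity relations: assign an integer id to every entity and relation."""
    entity2id, relation2id = {}, {}
    for head, relation, tail in triples:
        entity2id.setdefault(head, len(entity2id))
        entity2id.setdefault(tail, len(entity2id))
        relation2id.setdefault(relation, len(relation2id))
    return entity2id, relation2id


# Toy training triples (entity1, relation, entity2); a real run would read them from
# the training/validation/test files of the benchmark data set.
train_triples = [
    ("athlete_a", "plays_for", "team_b"),
    ("team_b", "based_in", "city_c"),
]
entity2id, relation2id = build_vocab(train_triples)

embed_dim = 100                                            # assumed embedding dimension
entity_emb = nn.Embedding(len(entity2id), embed_dim)       # stand-in for TransE entity vectors
relation_emb = nn.Embedding(len(relation2id), embed_dim)   # stand-in for TransE relation vectors

# Word-vector representation of one (entity, relation) pair, as later fed to the agent.
h, r, _ = train_triples[0]
state_vec = torch.cat([entity_emb(torch.tensor(entity2id[h])),
                       relation_emb(torch.tensor(relation2id[r]))])
print(state_vec.shape)    # torch.Size([200])
```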
Further, in step c), a state transition equation at a time $t$ is defined by the formula $P(S_{t+1} = s' \mid S_t = s, \max(A_t) = a)$, wherein $P$ represents the probability of selecting one $s'$ at a time $t+1$, $s'$ represents a state variable at the time $t+1$, $a$ represents a maximum probability relation selected according to a state $S_t$ at the time $t$, $S_{t+1}$ represents a state at the time $t+1$, $s$ represents an entity associated with $a$, and $A_t = P_{\mathrm{softmax}}(a \mid \theta)$, wherein $\theta$ represents a network model parameter; and a reward function $R(s_t)$ is defined by the formula

$$R(s_t) = \begin{cases} r_+ & \text{if } e_{end} \in e_{target} \\ 0 & \text{if } e_{end} \in e_{noanswer} \\ r_- & \text{if } e_{end} \notin \{e_{source}, e_{target}\} \end{cases}$$

wherein $e_{end}$ represents an end entity of the path reasoning of the relations, $e_{target}$ represents a target entity of the path reasoning of the relations, $e_{source}$ represents a set of entities in a given training path, $e_{noanswer}$ indicates that no node is found during the path reasoning, $r_+$ represents a positive reward value, and $r_-$ represents a negative reward value.

Further, the parameters of the LSTM model in step d) include an output dimension, a hidden layer dimension, training epochs, test epochs, a batch size, a maximum step size, a learning rate, a weight decay, a gamma optimizer, a beta optimizer, a Lambda optimizer, and an Adam optimizer.

Further, in step e), a maximum probability estimate relation is selected according to the state transition equation, the maximum probability estimate relation is evaluated according to the reward function, and an accumulated reward value $J(\theta)$ is calculated by the formula

$$J(\theta) = \mathbb{E}_{(e_s, r, e_t)}\,\mathbb{E}_{a_1, \dots, a_T \sim \pi_\theta}\left[\sum_t R(s_{t+1} \mid e_s, a_t)\right]$$

wherein $t$ represents the time step, $a$ represents an action relation, $R$ represents a reward value under a state space $s_t$ and a relation action $a_t$ at the time step $t$, $R(s_{t+1} \mid e_s, a_t)$ represents a reward value when the state $s_{t+1}$ is reached under the state space $s_t$ and the relation action $a_t$, $\pi(a \mid s; \theta)$ is a strategic function representing all action relations $a$ under the network model parameter $\theta$ and the state $s$, $a_1, \dots, a_T \sim \pi_\theta$ represents an action relation selected at each step in the strategic function, $\mathbb{E}$ represents taking an expectation according to the subscripted distribution, $A$ represents a set of all possible relations in the action relation space, $e_s$ represents a source entity $e_{source}$ of the path reasoning, and $\pi_\theta$ represents the strategic function with the network model parameter $\theta$.

Further, in step f), the updated parameters of the model network are obtained by calculating according to the formula

$$\nabla_\theta J(\theta) \approx \sum_t R(s_{t+1} \mid e_t, a_t)\, \nabla_\theta \log \pi_\theta(a_t \mid s_t)$$

wherein $\nabla_\theta$ represents calculating a derivative with respect to the model parameter $\theta$, $\nabla_\theta \log \pi_\theta(a_t \mid s_t)$ represents a derivative of the strategic function at the time $t$, $R(s_{t+1} \mid e_t, a_t)$ represents a reward at each time step, and $e_t$ represents the target entity $e_{target}$ of the path reasoning.

The advantages of the present invention are as follows. Reasoning paths of the knowledge graph can be effectively memorized through the two-way long short-term memory network. Moreover, the attention mechanism assigns weights to the memorized path states to obtain the state that needs attention, suppress the invalid state, and screen the memorized paths, which effectively solves the problem that the knowledge reasoning of the reinforcement learning agent cannot effectively memorize the reasoning path. In the present invention, the relation path features are extracted by using the LSTM-attention network model in the reinforcement learning agent, while optimizing the reward mechanism, thereby effectively improving the reasoning accuracy of the reinforcement learning-based knowledge graph reasoning algorithm applied to multiple benchmark data sets.
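To make the policy side of steps d) and e) concrete, the following PyTorch sketch shows one plausible shape of a two-way (bidirectional) LSTM whose memorized path states are re-weighted by an attention layer before a softmax over all adjacency relations. The class name, dimensions, relation count, and dummy input are illustrative assumptions, not the patented network.

```python
# Sketch of a bi-LSTM + attention policy that scores all adjacency relations.
import torch
import torch.nn as nn


class AttentionLSTMPolicy(nn.Module):
    def __init__(self, state_dim=200, hidden_dim=128, num_relations=400):
        super().__init__()
        self.lstm = nn.LSTM(state_dim, hidden_dim, batch_first=True,
                            bidirectional=True)            # two-way LSTM path memory
        self.attn = nn.Linear(2 * hidden_dim, 1)           # attention weight per path state
        self.out = nn.Linear(2 * hidden_dim, num_relations)

    def forward(self, path_states):
        # path_states: (batch, steps, state_dim) word vectors of the path taken so far
        memory, _ = self.lstm(path_states)                  # (batch, steps, 2*hidden)
        scores = torch.softmax(self.attn(memory), dim=1)    # attend over memorized states
        context = (scores * memory).sum(dim=1)              # weighted states, invalid ones suppressed
        return torch.softmax(self.out(context), dim=-1)     # probability estimates of all relations


policy = AttentionLSTMPolicy()
probs = policy(torch.randn(1, 3, 200))                      # dummy 3-step path
action = torch.distributions.Categorical(probs).sample()    # sample/select a relation action
```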
DETAILED DESCRIPTION OF THE EMBODIMENTS

The present invention will be further explained below. An attention-LSTM-based method for knowledge reasoning of a reinforcement learning agent includes the following steps:

a) a training set, a validation set, and a test set are loaded for triples (entity1, relation, entity2) in a knowledge graph, and a preprocessing operation is performed on the data sets;

b) a knowledge embedding model is loaded, and a word vector representation of the data sets is obtained;

c) a reinforcement learning environment for an interaction between the reinforcement learning agent and an evaluation function is defined and initialized, and an interaction function is defined;

d) a two-way LSTM network model is built, parameters of the LSTM model are set, an attention mechanism is added to an output of the LSTM model, and attention weight parameters are added to all relations;

e) the agent is constructed according to the network model in step d), and the word vector in step b) is input into the LSTM model to obtain probability estimates of all adjacency relations;

f) an iteration is performed according to a time step, and a derivative of an accumulated result obtained after the iteration is calculated to obtain updated parameters of the model network; and

g) after an entity pair (e1, e2) is given, an accuracy of a path between the entity pair is reasoned according to a mean reciprocal rank (MRR) and a hit@10 evaluation model.

Tests show that better reasoning accuracy is achieved on typical data sets (NELL-995, FB15K-237) for the multi-hop relation path reasoning task of the graph. In terms of the multi-hop relation path reasoning ability, the MRR is a commonly used evaluation mechanism for search algorithms, and hit@10 counts the reasoning predictions whose correct answer is ranked within the top 10 scores.

Reasoning paths of the knowledge graph can be effectively memorized through the two-way long short-term memory network. Moreover, the attention mechanism assigns weights to the memorized path states to obtain the state that needs attention, suppress the invalid state, and screen the memorized paths, which effectively solves the problem that the knowledge reasoning of the reinforcement learning agent cannot effectively memorize the reasoning path. In the present invention, the relation path features are extracted by using the LSTM-attention network model in the reinforcement learning agent, while optimizing the reward mechanism, thereby effectively improving the reasoning accuracy of the reinforcement learning-based knowledge graph reasoning algorithm applied to multiple benchmark data sets.

Embodiment 1

The preprocessing operation in step a) includes: collecting training reasoning path information, collecting test reasoning path information, and tokenizing entity relations.

Embodiment 2

In step b), an embedded word vector representation of the entity relations in the triples is obtained through OpenKE-based TransH, TransE, TransR, DistMult, and ComplEx embedding models, and each entity and relation is mapped into a dense continuous word vector.
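As an illustration of the step g) evaluation mentioned above, the short Python sketch below computes MRR and hit@10 from the ranks that a reasoner assigns to the correct tail entities of the test pairs; the rank values are invented purely for the example.

```python
# Sketch of the step g) metrics: MRR and hit@10 from the ranks of the correct entities.
def mrr(ranks):
    """Mean reciprocal rank: average of 1/rank of the correct answer."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks, k=10):
    """Fraction of test pairs whose correct answer is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


ranks = [1, 3, 12, 2, 7]          # hypothetical ranks of the true entity e2
print(f"MRR    = {mrr(ranks):.3f}")
print(f"hit@10 = {hits_at_k(ranks, 10):.3f}")
```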
Embodiment 3

In step c), a state transition equation at a time $t$ is defined by the formula $P(S_{t+1} = s' \mid S_t = s, \max(A_t) = a)$, wherein $P$ represents the probability of selecting one $s'$ at a time $t+1$, $s'$ represents a state variable at the time $t+1$, $a$ represents a maximum probability relation selected according to a state $S_t$ at the time $t$, $S_{t+1}$ represents a state at the time $t+1$, $s$ represents an entity associated with $a$, and $A_t = P_{\mathrm{softmax}}(a \mid \theta)$, wherein $\theta$ represents a network model parameter; and a reward function $R(s_t)$ is defined by the formula

$$R(s_t) = \begin{cases} r_+ & \text{if } e_{end} \in e_{target} \\ 0 & \text{if } e_{end} \in e_{noanswer} \\ r_- & \text{if } e_{end} \notin \{e_{source}, e_{target}\} \end{cases}$$

wherein $e_{end}$ represents an end entity of the path reasoning of the relations, $e_{target}$ represents a target entity of the path reasoning of the relations, $e_{source}$ represents a set of entities in a given training path, $e_{noanswer}$ indicates that no node is found during the path reasoning, $r_+$ represents a positive reward value, and $r_-$ represents a negative reward value.

Embodiment 4

The parameters of the LSTM model in step d) include an output dimension, a hidden layer dimension, training epochs, test epochs, a batch size, a maximum step size, a learning rate, a weight decay, a gamma optimizer, a beta optimizer, a Lambda optimizer, and an Adam optimizer.

In step e), a maximum probability estimate relation is selected according to the state transition equation, the maximum probability estimate relation is evaluated according to the reward function, and an accumulated reward value $J(\theta)$ is calculated by the formula

$$J(\theta) = \mathbb{E}_{(e_s, r, e_t)}\,\mathbb{E}_{a_1, \dots, a_T \sim \pi_\theta}\left[\sum_t R(s_{t+1} \mid e_s, a_t)\right]$$

wherein $t$ represents the time step, $a$ represents an action relation, $R$ represents a reward value under a state space $s_t$ and a relation action $a_t$ at the time step $t$, $R(s_{t+1} \mid e_s, a_t)$ represents a reward value when the state $s_{t+1}$ is reached under the state space $s_t$ and the relation action $a_t$, $\pi(a \mid s; \theta)$ is a strategic function representing all action relations $a$ under the network model parameter $\theta$ and the state $s$, $a_1, \dots, a_T \sim \pi_\theta$ represents an action relation selected at each step in the strategic function, $\mathbb{E}$ represents taking an expectation according to the subscripted distribution, $A$ represents a set of all possible relations in the action relation space, $e_s$ represents a source entity $e_{source}$ of the path reasoning, and $\pi_\theta$ represents the strategic function with the network model parameter $\theta$.

Embodiment 5

In step f), the updated parameters of the model network are obtained by calculating according to the formula

$$\nabla_\theta J(\theta) \approx \sum_t R(s_{t+1} \mid e_t, a_t)\, \nabla_\theta \log \pi_\theta(a_t \mid s_t)$$

wherein $\nabla_\theta$ represents calculating a derivative with respect to the model parameter $\theta$, $\nabla_\theta \log \pi_\theta(a_t \mid s_t)$ represents a derivative of the strategic function at the time $t$, $R(s_{t+1} \mid e_t, a_t)$ represents a reward at each time step, and $e_t$ represents the target entity $e_{target}$ of the path reasoning. The formula indicates that the accumulation, over the time steps, of the reward value multiplied by the gradient of the strategic network is approximately the derivative of the accumulated reward.
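For a concrete picture of the Embodiment 5 update, the following PyTorch sketch accumulates the reward-weighted log-probabilities over the time steps of one reasoning episode and lets automatic differentiation produce the policy-gradient estimate. The function names, the dummy rollout, and the optimizer settings are assumptions, and the policy used in the usage note is the illustrative AttentionLSTMPolicy sketched earlier, not the patented network.

```python
# Sketch of the Embodiment 5 update: sum_t R(s_{t+1}|e, a_t) * grad log pi_theta(a_t|s_t).
import torch


def reinforce_update(policy, optimizer, episode):
    """episode: list of (path_states, action_id, reward) tuples from one rollout."""
    loss = torch.tensor(0.0)
    for path_states, action_id, reward in episode:
        probs = policy(path_states)                  # pi_theta(a | s_t)
        log_prob = torch.log(probs[0, action_id])
        loss = loss - reward * log_prob              # minimize -sum R * log pi
    optimizer.zero_grad()
    loss.backward()                                  # derivative of the accumulated result
    optimizer.step()                                 # updated parameters of the model network
    return loss.item()


# Usage with the AttentionLSTMPolicy sketched earlier (assumed to be in scope):
# optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
# episode = [(torch.randn(1, t + 1, 200), 5, 1.0) for t in range(3)]   # dummy rollout
# reinforce_update(policy, optimizer, episode)
```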
Claims:
Claims (7)

[1] An attention-LSTM-based method for knowledge reasoning of a reinforcement learning agent, comprising the following steps:

a) loading a training set, a validation set, and a test set for triples in a knowledge graph, and performing a preprocessing operation on the data sets;

b) loading a knowledge embedding model, and obtaining a word vector representation of the data sets;

c) defining a reinforcement learning environment for an interaction between the reinforcement learning agent and an evaluation function, initializing the environment, and defining an interaction function;

d) building a two-way LSTM network model, setting parameters of the LSTM model, adding an attention mechanism to an output of the LSTM model, and adding attention weight parameters to all relations;

e) constructing the agent according to the network model in step d), and inputting the word vector in step b) into the LSTM model to obtain probability estimates of all adjacency relations;

f) performing an iteration according to a time step, and calculating a derivative of an accumulated result obtained after the iteration to obtain updated parameters of the model network; and

g) after an entity pair (e1, e2) is given, reasoning an accuracy of a path between the entity pair according to a mean reciprocal rank (MRR) and a hit@10 evaluation model.

[2] The attention-LSTM-based method for knowledge reasoning of the reinforcement learning agent according to claim 1, characterized in that the preprocessing operation in step a) comprises: collecting training reasoning path information, collecting test reasoning path information, and tokenizing entity relations.

[3] The attention-LSTM-based method for knowledge reasoning of the reinforcement learning agent according to claim 1, characterized in that in step b), an embedded word vector representation of the entity relations in the triples is obtained through OpenKE-based TransH, TransE, TransR, DistMult, and ComplEx embedding models, and each entity and relation is mapped into a dense continuous word vector.

[4] The attention-LSTM-based method for knowledge reasoning of the reinforcement learning agent according to claim 1, characterized in that in step c), a state transition equation at a time $t$ is defined by the formula $P(S_{t+1} = s' \mid S_t = s, \max(A_t) = a)$, wherein $P$ represents the probability of selecting one $s'$ at a time $t+1$, $s'$ represents a state variable at the time $t+1$, $a$ represents a maximum probability relation selected according to a state $S_t$ at the time $t$, $S_{t+1}$ represents a state at the time $t+1$, $s$ represents an entity associated with $a$, and $A_t = P_{\mathrm{softmax}}(a \mid \theta)$, wherein $\theta$ represents a network model parameter; and a reward function $R(s_t)$ is defined by the formula

$$R(s_t) = \begin{cases} r_+ & \text{if } e_{end} \in e_{target} \\ 0 & \text{if } e_{end} \in e_{noanswer} \\ r_- & \text{if } e_{end} \notin \{e_{source}, e_{target}\} \end{cases}$$

wherein $e_{end}$ represents an end entity of the path reasoning of the relations, $e_{target}$ represents a target entity of the path reasoning of the relations, $e_{source}$ represents a set of entities in a given training path, $e_{noanswer}$ indicates that no node is found during the path reasoning, $r_+$ represents a positive reward value, and $r_-$ represents a negative reward value.

[5]
The attention-LSTM-based method for knowledge reasoning of the reinforcement learning agent according to claim 1, characterized in that the parameters of the LSTM model in step d) comprise an output dimension, a hidden layer dimension, training epochs, test epochs, a batch size, a maximum step size, a learning rate, a weight decay, a gamma optimizer, a beta optimizer, a Lambda optimizer, and an Adam optimizer.

[6] The attention-LSTM-based method for knowledge reasoning of the reinforcement learning agent according to claim 4, characterized in that in step e), a maximum probability estimate relation is selected according to the state transition equation, the maximum probability estimate relation is evaluated according to the reward function, and an accumulated reward value $J(\theta)$ is calculated by the formula

$$J(\theta) = \mathbb{E}_{(e_s, r, e_t)}\,\mathbb{E}_{a_1, \dots, a_T \sim \pi_\theta}\left[\sum_t R(s_{t+1} \mid e_s, a_t)\right]$$

wherein $t$ represents the time step, $a$ represents an action relation, $R$ represents a reward value under a state space $s_t$ and a relation action $a_t$ at the time step $t$, $R(s_{t+1} \mid e_s, a_t)$ represents a reward value when the state $s_{t+1}$ is reached under the state space $s_t$ and the relation action $a_t$, $\pi(a \mid s; \theta)$ is a strategic function representing all action relations $a$ under the network model parameter $\theta$ and the state $s$, $a_1, \dots, a_T \sim \pi_\theta$ represents an action relation selected at each step in the strategic function, $\mathbb{E}$ represents taking an expectation according to the subscripted distribution, $A$ represents a set of all possible relations in the action relation space, $e_s$ represents a source entity $e_{source}$ of the path reasoning, and $\pi_\theta$ represents the strategic function with the network model parameter $\theta$.

[7] The attention-LSTM-based method for knowledge reasoning of the reinforcement learning agent according to claim 4, characterized in that in step f), the updated parameters of the model network are obtained by calculating according to the formula

$$\nabla_\theta J(\theta) \approx \sum_t R(s_{t+1} \mid e_t, a_t)\, \nabla_\theta \log \pi_\theta(a_t \mid s_t)$$

wherein $\nabla_\theta$ represents calculating a derivative with respect to the model parameter $\theta$, $\nabla_\theta \log \pi_\theta(a_t \mid s_t)$ represents a derivative of the strategic function at the time $t$, $R(s_{t+1} \mid e_t, a_t)$ represents a reward at each time step, and $e_t$ represents the target entity $e_{target}$ of the path reasoning.
Patent family:
Publication number | Publication date
CN112116069A | 2020-12-22
Priority:
Application number | Filing date | Patent title
CN202010918363.9A | 2020-09-03 | Attention-LSTM-based reinforcement learning Agent knowledge inference method